recurrent structure
Distil-xLSTM: Learning Attention Mechanisms through Recurrent Structures
Thiombiano, Abdoul Majid O., Hnich, Brahim, Mrad, Ali Ben, Mkaouer, Mohamed Wiem
The current era of Natural Language Processing (NLP) is dominated by Transformer models. However, novel architectures relying on recurrent mechanisms, such as xLSTM and Mamba, have been proposed as alternatives to attention-based models. Although their computation differs from that of the attention mechanism, these recurrent models yield good results and sometimes even outperform state-of-the-art attention-based models. In this work, we propose Distil-xLSTM, an xLSTM-based Small Language Model (SLM) trained by distilling knowledge from a Large Language Model (LLM); it shows promising results while being compute and scale efficient. Our Distil-xLSTM focuses on approximating a Transformer-based model's attention parametrization using its recurrent sequence mixing components and shows good results with minimal training.
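The abstract does not state which distillation objective is used. As a rough illustration only, the sketch below assumes the common soft-target formulation: a KL divergence between temperature-softened teacher and student token distributions. The function names and the temperature value are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a generic soft-target loss, not the paper's exact objective.
    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher (LLM) distribution
    q = softmax(student_logits, temperature)  # student (SLM) distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# The loss is zero when the student reproduces the teacher's logits
# and positive otherwise.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice such a term is usually combined with the ordinary next-token cross-entropy on the training data; the weighting between the two is a tuning choice.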
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Lan, Disen, Sun, Weigao, Hu, Jiaxi, Du, Jusen, Cheng, Yu
Transformers with linear recurrent modeling offer linear-time training and constant-memory inference. Despite their demonstrated efficiency and performance, pretraining such non-standard architectures from scratch remains costly and risky. The linearization of large language models (LLMs) transforms pretrained standard models into linear recurrent structures, enabling more efficient deployment. However, current linearization methods typically introduce additional feature map modules that require extensive fine-tuning and overlook the gating mechanisms used in state-of-the-art linear recurrent models. To address these issues, this paper presents Liger, short for Linearizing LLMs to gated recurrent structures. Liger is a novel approach for converting pretrained LLMs into gated linear recurrent models without adding extra parameters. It repurposes the pretrained key matrix weights to construct diverse gating mechanisms, facilitating the formation of various gated recurrent structures while avoiding the need to train additional components from scratch. Using lightweight fine-tuning with Low-Rank Adaptation (LoRA), Liger restores the performance of the linearized gated recurrent models to match that of the original LLMs. Additionally, we introduce Liger Attention, an intra-layer hybrid attention mechanism, which recovers 93% of the Transformer-based LLM's performance using 0.02% of the pre-training tokens during the linearization process, achieving competitive results across multiple benchmarks, as validated on models ranging from 1B to 8B parameters. Code is available at https://github.com/OpenSparseLLMs/Linearization.
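To make "linear-time training and constant-memory inference" concrete, here is a minimal sketch of a generic gated linear recurrence. It assumes one scalar gate per time step and takes the gates as an input; how Liger derives its gates from the pretrained key weights is not reproduced here, and the function name is hypothetical.

```python
def gated_linear_recurrence(queries, keys, values, gates):
    """Run S_t = g_t * S_{t-1} + k_t^T v_t and emit o_t = q_t S_t.

    Assumption: a scalar gate g_t in [0, 1] per step. The state S has a
    fixed shape (d_k x d_v), so memory does not grow with sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        # Decay the previous state, then add the outer product k^T v.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read out with the query: o_t = q_t S_t.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


out = gated_linear_recurrence(
    queries=[[1.0, 0.0], [1.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
    gates=[1.0, 0.5],
)
# out == [[1.0, 2.0], [3.5, 5.0]]
```

With every gate fixed at 1.0 this reduces to ungated causal linear attention; gates below 1.0 let the model forget older key-value pairs.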
MSegRNN: Enhanced SegRNN Model with Mamba for Long-Term Time Series Forecasting
Zhao, GaoXiang, Wang, XiaoQiang
The field of long-term time series forecasting (LTSF) demands handling extensive look-back windows and long-range prediction steps, posing significant challenges for RNN-based methodologies. Among these, SegRNN, a robust RNN-driven model, has gained considerable attention in LTSF analysis for achieving state-of-the-art results while maintaining a remarkably streamlined architecture. Concurrently, the Mamba structure has demonstrated its advantages in small to medium-sized models due to its capability for information selection. This study introduces a variant of SegRNN that preprocesses information using a fine-tuned single-layer Mamba structure. Additionally, it incorporates implicit segmentation and residual structures into the model's encoding section to further reduce the inherent data iterative cycles of RNN architectures and implicitly integrate inter-channel correlations. This variant, named MSegRNN, utilizes the Mamba structure to select useful information, resulting in a transformed sequence. The linear-strategy-adapted derivative retains the superior memory efficiency of the original SegRNN while demonstrating enhanced performance. Empirical evaluations on real-world LTSF datasets demonstrate the superior performance of our model, thereby contributing to the advancement of LTSF methodologies.
Mining Frequent Structures in Conceptual Models
Fumagalli, Mattia, Sales, Tiago Prince, Barcelos, Pedro Paulo F., Micale, Giovanni, Zaytsev, Vadim, Calvanese, Diego, Guizzardi, Giancarlo
The problem of using structured methods to represent knowledge is well-known in conceptual modeling and has been studied for many years. It has been proven that adopting modeling patterns represents an effective structural method. Patterns are, indeed, generalizable recurrent structures that can be exploited as solutions to design problems. They aid in understanding and improving the process of creating models. The undeniable value of using patterns in conceptual modeling was demonstrated in several experimental studies. However, discovering patterns in conceptual models is widely recognized as a highly complex task and a systematic solution to pattern identification is currently lacking. In this paper, we propose a general approach to the problem of discovering frequent structures, as they occur in conceptual modeling languages. As proof of concept for our scientific contribution, we provide an implementation of the approach, by focusing on UML class diagrams, in particular OntoUML models. This implementation comprises an exploratory tool, which, through the combination of a frequent subgraph mining algorithm and graph manipulation techniques, can process multiple conceptual models and discover recurrent structures according to multiple criteria. The primary objective is to offer a support facility for language engineers. This can be employed to leverage both good and bad modeling practices, to evolve and maintain the conceptual modeling language, and to promote the reuse of encoded experience in designing better models with the given language.
Practical Edge Detection via Robust Collaborative Learning
Edge detection, a core component in a wide range of vision-oriented tasks, aims to identify object boundaries and prominent edges in natural images. An edge detector should be both efficient and accurate for practical use. To achieve this goal, two key issues must be addressed: 1) how to liberate deep edge models from the inefficient pre-trained backbones leveraged by most existing deep learning methods, in order to save computational cost and reduce model size; and 2) how to mitigate the negative influence of noisy or even wrong labels in training data, which are widespread in edge detection due to the subjectivity and ambiguity of annotators, in order to improve robustness and accuracy. In this paper, we attempt to address both problems simultaneously by developing a collaborative learning based model, termed PEdger. The principle behind our PEdger is that the information learned from different training moments and heterogeneous (recurrent and non-recurrent in this work) architectures can be assembled to explore robust knowledge against noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies together with quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets are conducted to verify the effectiveness of our design, and demonstrate its superiority over other competitors in terms of accuracy, speed, and model size. Codes can be found at https://github.co/ForawardStar/PEdger.